Self Attention

Self-attention lets every element of the input sequence attend to every other element, so the model can capture long-range dependencies between elements of the sequence.

Input and Output

  • Input: a sequence of vectors (e.g., word embeddings or one-hot encodings)
  • Output:
  1. N -> model -> N (one label per element, e.g. POS tagging)
  2. N -> model -> 1 (one label for the whole sequence, e.g. classification)
  3. N -> model -> N' (the model decides the output length, e.g. translation), i.e. Sequence to Sequence (Seq2Seq)
Note

The material below focuses on the N -> model -> N' (Seq2Seq) case.

Relevance

The relevance (attention score) between two input vectors can be computed in two common ways (see the sketch after this list):

  1. Dot-Product: transform the two vectors with learned matrices, then take the dot product of the results

  2. Additive: transform the two vectors, add the results, pass the sum through tanh, and project it to a scalar
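
A minimal NumPy sketch of the two scoring functions. This assumes the usual query/key formulation; the weight names W_q, W_k and the projection vector w are illustrative, not defined in these notes.

```python
import numpy as np

def dot_product_score(a_i, a_j, W_q, W_k):
    """Dot-product relevance: compare transformed query and key vectors."""
    q = W_q @ a_i              # query transform of the first vector
    k = W_k @ a_j              # key transform of the second vector
    return q @ k               # scalar relevance score

def additive_score(a_i, a_j, W_q, W_k, w):
    """Additive relevance: add the transforms, squash with tanh, then project."""
    q = W_q @ a_i
    k = W_k @ a_j
    return w @ np.tanh(q + k)  # scalar relevance score
```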

Self-Attention Mechanism
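
The notes do not spell the mechanism out here, so the following is a hedged sketch of standard scaled dot-product self-attention (the scaling by sqrt(d_k) and the W_q, W_k, W_v names follow the common Transformer convention rather than anything stated above):

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence X of shape (N, d)."""
    Q = X @ W_q                                # queries, shape (N, d_k)
    K = X @ W_k                                # keys,    shape (N, d_k)
    V = X @ W_v                                # values,  shape (N, d_v)
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # (N, N) relevance matrix
    # row-wise softmax: each position's attention weights sum to 1
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                         # (N, d_v) weighted sum of values
```

Each output vector is a weighted sum of all value vectors, which is what lets every position see the whole sequence.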

Multi-Head Self Attention
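
A sketch assuming the usual formulation: split the model dimension into num_heads lower-dimensional heads, attend in each head independently, then concatenate and project with an output matrix (W_o and the other names are illustrative):

```python
import numpy as np

def multi_head_self_attention(X, W_q, W_k, W_v, W_o, num_heads):
    """Run self-attention in several lower-dimensional heads in parallel."""
    N, d = X.shape
    d_head = d // num_heads
    # project, then split the feature dimension into heads: (h, N, d_head)
    Q = (X @ W_q).reshape(N, num_heads, d_head).transpose(1, 0, 2)
    K = (X @ W_k).reshape(N, num_heads, d_head).transpose(1, 0, 2)
    V = (X @ W_v).reshape(N, num_heads, d_head).transpose(1, 0, 2)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)  # (h, N, N)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    heads = weights @ V                                  # (h, N, d_head)
    concat = heads.transpose(1, 0, 2).reshape(N, d)      # re-join the heads
    return concat @ W_o                                  # final projection
```

Each head can specialize in a different notion of relevance between positions.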

Positional Encoding
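
Self-attention by itself is order-invariant, so position information has to be injected into the input. One standard choice is the sinusoidal encoding from "Attention Is All You Need" (these notes do not commit to a specific scheme, so this is just one possibility):

```python
import numpy as np

def sinusoidal_positional_encoding(N, d):
    """Sinusoidal positional encoding for a length-N sequence of dimension d."""
    pos = np.arange(N)[:, None]                 # positions 0 .. N-1
    i = np.arange(d)[None, :]                   # feature dimension indices
    angle = pos / np.power(10000, (2 * (i // 2)) / d)
    pe = np.zeros((N, d))
    pe[:, 0::2] = np.sin(angle[:, 0::2])        # even dimensions: sine
    pe[:, 1::2] = np.cos(angle[:, 1::2])        # odd dimensions: cosine
    return pe                                   # used as X + pe before attention
```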

Truncated Self Attention
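
Truncated self-attention limits each position to a window around itself instead of the full sequence, which keeps the (N, N) score matrix affordable for very long inputs. A sketch with an assumed window parameter:

```python
import numpy as np

def truncated_attention_mask(N, window):
    """Allow position i to attend only to positions j with |i - j| <= window."""
    idx = np.arange(N)
    allowed = np.abs(idx[:, None] - idx[None, :]) <= window
    # -inf outside the window so the softmax gives those pairs zero weight
    return np.where(allowed, 0.0, -np.inf)
```

The returned matrix is added to the scores before the softmax, e.g. in the self_attention sketch above.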

Masked Self Attention
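
Masked (causal) self-attention lets position i attend only to positions up to and including i, as in autoregressive decoders that must not see future tokens. A minimal sketch of the mask:

```python
import numpy as np

def causal_mask(N):
    """Mask so that position i may only attend to positions j <= i."""
    allowed = np.tril(np.ones((N, N), dtype=bool))  # lower-triangular pattern
    # -inf above the diagonal so the softmax zeroes out future positions
    return np.where(allowed, 0.0, -np.inf)
```

Like the truncated mask, it is added to the score matrix before the softmax.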